
    Incremental elasticity for array databases

    Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources. Intel Corporation (Science and Technology Center for Big Data
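The idea of redistributing data only to new nodes can be illustrated with a minimal Python sketch. It is not the partitioners from the paper: the function name `rebalance_to_new_nodes`, the chunk and node identifiers, and the greedy "largest chunk from the most loaded old node" rule are all assumptions made for illustration. It assumes at least one existing node and positive chunk sizes.

```python
def rebalance_to_new_nodes(placement, sizes, new_nodes):
    """Move chunks only onto newly added nodes (hypothetical greedy rule).

    placement: dict mapping chunk id -> node id (existing layout)
    sizes:     dict mapping chunk id -> chunk size
    new_nodes: list of node ids that were just added and hold no data
    Returns the new placement and the list of (chunk, source, target) moves.
    """
    placement = dict(placement)  # work on a copy, keep the input intact
    load = {}
    for chunk, node in placement.items():
        load[node] = load.get(node, 0) + sizes[chunk]
    for node in new_nodes:
        load.setdefault(node, 0)

    # Every node would ideally hold the average load after expansion.
    target = sum(load.values()) / len(load)
    moves = []
    for new in new_nodes:
        while load[new] < target:
            # Donor is the most loaded pre-existing node.
            donor = max((n for n in load if n not in new_nodes), key=load.get)
            # Only consider chunks that do not push the new node past the target.
            candidates = [c for c, n in placement.items()
                          if n == donor and load[new] + sizes[c] <= target]
            if not candidates:
                break
            chunk = max(candidates, key=sizes.get)
            placement[chunk] = new
            load[donor] -= sizes[chunk]
            load[new] += sizes[chunk]
            moves.append((chunk, donor, new))
    return placement, moves
```

Because chunks never move between pre-existing nodes, the amount of data shipped is bounded by what the new nodes receive, which is the property the abstract describes as incremental reorganization.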

    LINVIEW: Incremental View Maintenance for Complex Analytical Queries

    Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO
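A small Python sketch can show why keeping a delta in factored form helps. It is an illustration under stated assumptions, not LINVIEW itself: the view is taken to be a single product V = A·B, the change to A is assumed to be a rank-1 update u·vᵀ, and the helper names `matmul` and `apply_rank1_delta` are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_rank1_delta(view, u, v, b):
    """Refresh view = A @ B after A changes by the outer product of u and v.

    (A + u v^T) B = A B + u (v^T B), so the delta is itself an outer product
    of u with the row vector v^T B. For n x n inputs this costs O(n^2),
    whereas recomputing the product from scratch costs O(n^3).
    """
    vtb = [sum(v[k] * b[k][j] for k in range(len(v)))
           for j in range(len(b[0]))]
    return [[view[i][j] + u[i] * vtb[j] for j in range(len(vtb))]
            for i in range(len(u))]
```

The delta never needs to be materialized as a dense matrix before it is multiplied, which is the kind of containment of change that the abstract attributes to factorization-based techniques.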

    Scalable data management in distributed information systems

    In the era of cloud computing and huge information systems, distributed applications should manage dynamic workloads; i.e., the number of client requests per time unit may vary frequently and servers should rapidly adapt their computing efforts to those workloads. Cloud systems provide a solid basis for this kind of application, but most traditional relational database systems are unprepared to scale up with this kind of distributed system. This paper surveys different techniques being used in modern SQL, NoSQL and NewSQL systems in order to increase the scalability and adaptability in the management of persistent data. © 2011 Springer-Verlag. This work has been supported by EU FEDER and Spanish MICINN under research grants TIN2009-14460-C03-01 and TIN2010-17193. Pallardó Lozoya, MR.; Esparza Peidro, J.; García Escriva, JR.; Decker, H.; Muñoz Escoí, FD. (2011). Scalable data management in distributed information systems. Lecture Notes in Computer Science. 7046:208-217. https://doi.org/10.1007/978-3-642-25126-9_31

    Student See Versus Student Do: A Comparative Study of Two Online Tutorials

    This study examines the impact on student performance after interactive and non-interactive tutorials using a 2 × 2 treatment-control design. In an undergraduate management course, a control group watched a video tutorial while the treatment group received the same content using a dynamic tutorial. Both groups received the same quiz questions. Using effect size to determine the magnitude of change, it was found that those in the treatment condition performed better than those in the control condition. Students were able to take the quiz up to two times. When examining for change in performance from attempt one to attempt two, the treatment group showed a greater magnitude of change. Students who consistently performed lowest on the quizzes outperformed all students in learning gains.

    Functional pearl: a SQL to C compiler in 500 lines of code

    We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code - a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source LMS Framework (Lightweight Modular Staging), we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation known as Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g. data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives
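The step from interpreter to compiler can be sketched in a few lines of Python. This is a toy illustration, not the paper's Scala/LMS implementation: the plan encoding (nested tuples), the operator names `scan`, `select` and `project`, and the functions `interpret` and `compile_plan` are all assumptions. The compiler mirrors the interpreter's structure but emits source code specialized to one plan instead of evaluating it.

```python
def interpret(plan, rows):
    """Evaluate a relational-algebra plan directly, re-inspecting it per call."""
    op = plan[0]
    if op == "scan":
        return list(rows)
    if op == "select":
        _, field, value, child = plan
        return [r for r in interpret(child, rows) if r[field] == value]
    if op == "project":
        _, fields, child = plan
        return [{f: r[f] for f in fields} for r in interpret(child, rows)]
    raise ValueError(f"unknown operator: {op}")


def compile_plan(plan):
    """Specialize the interpreter to one plan by generating Python source."""
    def gen(p):
        op = p[0]
        if op == "scan":
            return "rows"
        if op == "select":
            _, field, value, child = p
            return f"(r for r in {gen(child)} if r[{field!r}] == {value!r})"
        if op == "project":
            _, fields, child = p
            items = ", ".join(f"{f!r}: r[{f!r}]" for f in fields)
            return f"({{{items}}} for r in {gen(child)})"
        raise ValueError(f"unknown operator: {op}")

    source = f"def query(rows):\n    return list({gen(plan)})\n"
    namespace = {}
    exec(source, namespace)  # the plan is static, so dispatch happens once here
    return namespace["query"], source
```

The generated `query` function contains no trace of the plan data structure or the operator dispatch, only the residual loop over rows. The paper targets C and obtains the same effect through staging in LMS rather than string templates.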

    Class of Service in the High Performance Storage System

    Quality of service capabilities are commonly deployed in archival mass storage systems as one or more client-specified parameters to influence physical location of data in multi-level device hierarchies for performance or cost reasons. The capabilities of new high-performance storage architectures and the needs of data-intensive applications require better quality of service models for modern storage systems. HPSS, a new distributed, high-performance, scalable storage system, uses a Class of Service (COS) structure to influence system behavior. The authors summarize the design objectives and functionality of HPSS and describe how COS defines a set of performance, media, and residency attributes assigned to storage objects managed by HPSS servers. COS definitions are used to provide appropriate behavior and service levels as requested (or demanded) by storage system clients. They compare the HPSS COS approach with other quality of service concepts and discuss alignment possibilities.

    A grid-based infrastructure for distributed retrieval

    In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to ‘lift’ the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the field of Earth Science.

    Implementation of Multidimensional Databases with Document-Oriented NoSQL

    NoSQL (Not Only SQL) systems are becoming popular due to known advantages such as horizontal scalability and elasticity. In this paper, we study the implementation of data warehouses with document-oriented NoSQL systems. We propose mapping rules that transform the multidimensional data model to logical document-oriented models. We consider three different logical models and we use them to instantiate data warehouses. We focus on data loading, model-to-model conversion and OLAP cuboid computation.
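To make the notion of a mapping rule concrete, here is a minimal Python sketch of two possible document layouts for one fact, plus a simple cuboid computation. The abstract does not spell out its three logical models, so the flat and nested layouts, the field-naming scheme, and the function names `to_flat_document`, `to_nested_document` and `cuboid` are assumptions for illustration only.

```python
from collections import defaultdict


def to_flat_document(fact):
    """Map a fact to one document with every attribute at the top level."""
    doc = dict(fact["measures"])
    for dim, attrs in fact["dimensions"].items():
        for attr, value in attrs.items():
            doc[f"{dim}_{attr}"] = value  # hypothetical naming: dimension_attribute
    return doc


def to_nested_document(fact):
    """Map a fact to one document with each dimension as a sub-document."""
    doc = dict(fact["measures"])
    for dim, attrs in fact["dimensions"].items():
        doc[dim] = dict(attrs)
    return doc


def cuboid(documents, group_key, measure):
    """Aggregate a measure over flat documents, grouped by one attribute."""
    totals = defaultdict(int)
    for doc in documents:
        totals[doc[group_key]] += doc[measure]
    return dict(totals)
```

Converting between the two layouts is a pure restructuring of the same values, which is what makes model-to-model conversion a natural operation to benchmark alongside loading and cuboid computation.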